187 research outputs found

Hiding Outliers in High-Dimensional Data Spaces

Author: Böhm Klemens
Steinbuß Georg
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2017

Detecting outliers in high-dimensional data is crucial in many domains. Due to the curse of dimensionality, one typically does not detect outliers in the full space, but in subspaces of it. More specifically, since the number of subspaces is huge, the detection takes place in only some subspaces. In consequence, one might miss hidden outliers, i.e., outliers only detectable in certain subspaces. In this paper, we take the opposite perspective, which is of practical relevance as well, and study how to hide outliers in high-dimensional data spaces. We formally prove characteristics of hidden outliers. We also propose an algorithm to place them in the data. It focuses on the regions close to existing data objects and is more efficient than an exhaustive approach. In experiments, we both evaluate our formal results and show the usefulness of our algorithm using different subspace selection schemes, outlier detection methods and data sets.

Crossref

KITopen

Scenario Discovery via Rule Extraction

Author: Arzamasov Vadim
Böhm Klemens
Publication date: 03/10/2019

Scenario discovery is the process of finding areas of interest, commonly referred to as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions (inputs of the simulation model) under which the system under investigation is unstable. A commonly used algorithm for scenario discovery is PRIM. It yields scenarios in the form of hyper-rectangles, which are human-comprehensible. When the simulation model has many inputs and the simulations are computationally expensive, PRIM may not produce good results, given the affordable volume of data. We therefore propose a new procedure for scenario discovery: we train an intermediate statistical model which generalizes fast, and use it to label (a lot of) data for PRIM. We provide the statistical intuition behind our idea. Our experimental study shows that this method is much better than PRIM itself. Specifically, our method reduces the number of simulation runs necessary by 75% on average.
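The procedure described in this abstract (label a large, cheap sample with an intermediate model, then run PRIM on it) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the toy `simulate` function, the 1-nearest-neighbor stand-in for the intermediate model, the simplified peeling routine `prim_peel`, and all parameter values.

```python
import random

def simulate(x):
    """Hypothetical stand-in for an expensive simulation; 1 marks the outcome of interest."""
    return 1 if (x[0] > 0.6 and x[1] < 0.4) else 0

def nearest_neighbor_label(train, x):
    """Toy intermediate model (assumption): copy the label of the closest simulated point."""
    _, label = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
    return label

def prim_peel(points, labels, alpha=0.1, min_support=0.05):
    """Simplified PRIM-style peeling: repeatedly cut the slice that raises the box mean most."""
    dims = len(points[0])
    box = [[0.0, 1.0] for _ in range(dims)]   # [lower, upper] bound per input
    inside = list(range(len(points)))          # indices of points still in the box
    n_total = len(points)
    while True:
        current_mean = sum(labels[i] for i in inside) / len(inside)
        best = None
        for d in range(dims):
            values = sorted(points[i][d] for i in inside)
            if len(values) < 2:
                continue
            cut = max(1, int(alpha * len(values)))
            # side 0 raises the lower bound, side 1 lowers the upper bound
            for side, bound in ((0, values[cut]), (1, values[-cut - 1])):
                if side == 0:
                    kept = [i for i in inside if points[i][d] >= bound]
                else:
                    kept = [i for i in inside if points[i][d] <= bound]
                if len(kept) < min_support * n_total or len(kept) == len(inside):
                    continue
                mean = sum(labels[i] for i in kept) / len(kept)
                if mean > current_mean and (best is None or mean > best[0]):
                    best = (mean, d, side, bound, kept)
        if best is None:                       # no peel improves the mean: stop
            return box, current_mean
        _, d, side, bound, inside = best
        box[d][side] = bound

random.seed(0)
# A small budget of "expensive" simulation runs
train = [(x, simulate(x)) for x in ([random.random(), random.random()] for _ in range(200))]
# Many cheap points, labeled by the intermediate model instead of the simulation
cheap = [[random.random(), random.random()] for _ in range(5000)]
cheap_labels = [nearest_neighbor_label(train, x) for x in cheap]
box, box_mean = prim_peel(cheap, cheap_labels)
print(box, box_mean)
```

The returned `box` is a hyper-rectangle, one interval per input, which is the kind of human-comprehensible scenario the abstract refers to. Real PRIM implementations also include pasting and a trade-off trajectory, which this sketch omits.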

arXiv.org e-Print Archive

Revealing the Suitability of Incentive Mechanisms for the Collaborative Creation of Structured Knowledge

Author: Böhm Klemens
Kühne Conny
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

KITopen

Auction-based traffic management: towards effective concurrent usage of road intersections

Author: Böhm Klemens
Schepperle Heiko
Publication venue: Universität Karlsruhe (TH)
Publication date: 01/01/2008

KITopen

Improved bibliographic reference parsing based on repeated patterns

Author: Böhm Klemens
Sautter Guido
Publication date: 30/04/2014

uploaded by Plaz

ZENODO

A combining approach to find all taxon names (FAT)

Author: Agosti Donat
Böhm Klemens
Sautter Guido
Publication venue: The University of Kansas
Publication date: 01/01/2006

Most of the literature on natural history is hidden in millions of pages stacked up in our libraries. Various initiatives now aim at making these publications digitally accessible and searchable by applying XML markup technologies. The unique biological names play a crucial role in linking content related to a particular taxon. Thus, discovering and marking them up is extremely important. Since their manual extraction and markup is cumbersome and time-intensive, it needs to be automated. In this paper, we present computational linguistics techniques and evaluate how they can help to extract taxonomic names automatically. We build on an existing approach for extraction of such names (Koning et al. 2005) and combine it with several other learning techniques. We apply them to the texts sequentially so that each technique can use the results from the preceding ones. In particular, we use structural rules, dynamic lexica with fuzzy lookups, and word-level language recognition. We use legacy documents from different sources and times as a test bed for our evaluation. The experimental results for our combining approach (FAT) show greater than 99% precision and recall. They reveal the potential of computational linguistics techniques towards an automated markup of biosystematics publications.

Crossref

Directory of Open Access Journals

The University of Kansas: Journals@KU

Biodiversity Informatics

A Comprehensive Study of k-Portfolios of Recent SAT Solvers

Author: Bach Jakob
Böhm Klemens
Iser Markus
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Publication date: 30/07/2022

Hard combinatorial problems such as propositional satisfiability are ubiquitous. The holy grail is a solution method that performs well on all problem instances. However, new approaches emerge regularly, some of which are complementary to existing solvers in that they only run faster on some instances but not on many others. While portfolios, i.e., sets of solvers, have been touted as useful, putting together such portfolios also needs to be efficient. In particular, it remains an open question how well portfolios can exploit the complementarity of solvers. This paper features a comprehensive analysis of portfolios of recent SAT solvers, the ones from the SAT Competitions 2020 and 2021. We determine optimal portfolios with exact and approximate approaches and study the impact of portfolio size k on performance. We also investigate how effective off-the-shelf prediction models are for instance-specific solver recommendations. One result is that the portfolios found with an approximate approach are as good as the optimal solution in practice. We also observe that marginal returns decrease very quickly with larger k, and our prediction models do not lead to better performance beyond very small portfolio sizes.
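To make the notion of a k-portfolio concrete, the following minimal sketch compares an exhaustive search with a greedy heuristic on a toy runtime matrix. The abstract does not say which exact or approximate approaches the paper uses, so both functions, the cost definition (sum of the per-instance best runtime among the chosen solvers), and the data are assumptions for illustration only.

```python
from itertools import combinations

def portfolio_cost(runtimes, portfolio):
    """Sum over instances of the best (minimum) runtime among the chosen solvers."""
    return sum(min(row[s] for s in portfolio) for row in runtimes)

def exact_portfolio(runtimes, k):
    """Exhaustive search: evaluate every size-k subset of solvers."""
    n_solvers = len(runtimes[0])
    return min(combinations(range(n_solvers), k),
               key=lambda p: portfolio_cost(runtimes, p))

def greedy_portfolio(runtimes, k):
    """Greedy heuristic: add the solver that lowers the portfolio cost most, k times."""
    n_solvers = len(runtimes[0])
    chosen = []
    for _ in range(k):
        best = min((s for s in range(n_solvers) if s not in chosen),
                   key=lambda s: portfolio_cost(runtimes, chosen + [s]))
        chosen.append(best)
    return tuple(sorted(chosen))

# Hypothetical runtimes in seconds: rows are instances, columns are solvers 0..2
RUNTIMES = [
    [1, 50, 40],
    [60, 2, 45],
    [55, 50, 3],
    [5, 6, 50],
]

print(exact_portfolio(RUNTIMES, 2), greedy_portfolio(RUNTIMES, 2))
```

On this toy matrix both searches select solvers 1 and 2 for k = 2. The exhaustive search grows combinatorially with the number of solvers, which is why an approximate approach is attractive when it matches the optimum in practice.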

KITopen

Automatic Generation of Optimized Process Models from Declarative Specifications

Author: Böhm Klemens
Mrasek Richard
Mülle Jutta
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2014

Process models are often generic, i.e., they describe similar cases or contexts. For instance, a process model for commissioning can cover both vehicles with an automatic and with a manual transmission, by executing alternative tasks. A generic process model is not optimal compared to one tailored to a specific context. Given a declarative specification of the constraints and a specific context, we study how to automatically generate a good process model and propose a novel approach. We focus on the restricted case in which no task is repeated, as is the case in commissioning and elsewhere, e.g., manufacturing. Our approach uses a probabilistic search to find a good process model according to quality criteria. It can handle complex real-world specifications containing several hundred constraints and more than one hundred tasks. The process models generated with our scheme are superior (nearly twice as fast) to ones designed by hand by professional modelers.

KITopen